binary search - UK secondary Computer Science

A linear search works, and is the best that we can do (in the GCSE) if our data is unsorted. However, if we have sorted data then a binary search is (on average) much more efficient. This is because instead of starting with the first item, checking if it's the one we want, and then proceeding onto the next item, we can instead start in the middle, see if the item we want is higher or lower than the element in the middle, and then use that to refine our search.

Let's assume we have this data:

[
	{
		"name": "Bertrand",
		"photo_count": 8846
	},
	{
		"name": "Boyd",
		"photo_count": 939
	},
	{
		"name": "Carlos",
		"photo_count": 204
	},
	{
		"name": "Cielo",
		"photo_count": 26
	},
	{
		"name": "Clementina",
		"photo_count": 3111
	},
	{
		"name": "Conor",
		"photo_count": 378
	},
	{
		"name": "Demarcus",
		"photo_count": 42923
	},
	{
		"name": "Eli",
		"photo_count": 290
	},
	{
		"name": "Enrico",
		"photo_count": 338
	},
	{
		"name": "Eula",
		"photo_count": 853
	},
	{
		"name": "Fernando",
		"photo_count": 84567
	},
	{
		"name": "Jovanny",
		"photo_count": 17909
	},
	{
		"name": "Maya",
		"photo_count": 3372
	},
	{
		"name": "Minnie",
		"photo_count": 13
	},
	{
		"name": "Morton",
		"photo_count": 16658
	},
	{
		"name": "Raymond",
		"photo_count": 22
	},
	{
		"name": "Saul",
		"photo_count": 3065
	},
	{
		"name": "Selena",
		"photo_count": 314
	},
	{
		"name": "Sigrid",
		"photo_count": 40
	},
	{
		"name": "Thora",
		"photo_count": 170
	}
]

number of items = 20

We want find out how many photos "Morton" has. This is sorted data (in order to determine the order of the names, we use a similar algorithm to that of the dictionary – start with the first letter and see which one comes first in the alphabet; if they are the same letter, move onto the next letter and repeat) and all the names are unique (so we cannot have a collision).

The mechanism for a binary search is not too complicated:

pick the middle item
check if the middle item is greater or smaller than the item we're looking for (if it is, then we stop searching and return the value)
pick either the lower half or the upper half of the list (lower half if the middle item is higher than the item we want, upper half if the middle item is lower than the one we want)
select the middle item of that list
repeat (unless we've reached the end)

Consider how we'd do that for the list above:

we start by selecting the middle item, i.e. "Eula" (don't forget – list indices start from zero in most programming languages). If we have an even number of items, we don't actually have a middle of the list – both the 10th and 11th (not zero-indexed) items of the list could be the middle. Here I've chosen to always use the one that comes first (so we round down from the median, which is the (n+1)/2th item).
We compare "Eula" to "Morton". As "Eula" begins with "E" and "Morgan" with "M" we conclude that "Eula" comes before "Morton". This means that we can ascertain that "Morton" is definitely in the top half of the list.
We now have this data

[
  {
    "name": "Fernando",
    "photo_count": 84567
  },
  {
    "name": "Jovanny",
    "photo_count": 17909
  },
  {
    "name": "Maya",
    "photo_count": 3372
  },
  {
    "name": "Minnie",
    "photo_count": 13
  },
  {
    "name": "Morton",
    "photo_count": 16658
  },
  {
    "name": "Raymond",
    "photo_count": 22
  },
  {
    "name": "Saul",
    "photo_count": 3065
  },
  {
    "name": "Selena",
    "photo_count": 314
  },
  {
    "name": "Sigrid",
    "photo_count": 40
  },
  {
    "name": "Thora",
    "photo_count": 170
  }
]

We pick the middle item in this list – i.e. "Morton" (there ten items, so I've picked the fifth one – but we could have picked the sixth)
We have a match!

One thing that is interesting to consider is how many items we have to look up with a binary vs a linear search.

With a binary search we had to look at two items
With a linear search we would have had to look at 15 items (assuming we started from the front – six if we started from the back)

Yes, I know that this list turned out to be quite convenient, which is why in general we look at the average performance of algorithms and their worst-case performance when we select them. We can consider the differences between these two algorithms mathematically.

For a binary search of a list containing 20 elements (assuming we always pick the value that comes first) once we pick element 9, we have conducted one lookup. Assuming the worse case (i.e. that we have to search the 10 element list, from 11-20 inclusive) we then select element five. In this case, we have done two lookups. Assuming that we then have to search the larger list again, we now search a five element list. We pick the middle item of that list – so we have now made three lookups. In the worst case, we now search a list of two items, so by the time we have searched the whole (sorted) list, we have looked at a maximum of five items.

In contrast, in the worst case linear search we have to look at an entire 20 items. Think of all the power savings you could get with a binary search!

You can also have a look at some further information on binary vs linear searching.